[BFT] Node ejected before epoch recovery #6632
base: feature/efm-recovery
Conversation
```
@@ -55,9 +56,6 @@ func GenerateRecoverEpochTxArgs(log zerolog.Logger,

	internalNodesMap := make(map[flow.Identifier]struct{})
	for _, node := range internalNodes {
		if !currentEpochIdentities.Exists(node.Identity()) {
```
This was originally put here to communicate an unusual state to operators when running the `efm-recover-args` command. I agree it should not cause an error, but maybe we can replace it with a WARN log. Then, when it is run as part of the CLI tool, we will still surface this info to operators.
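A minimal sketch of that replacement, assuming the zerolog logger already passed into `GenerateRecoverEpochTxArgs` and a `NodeID` field on the `internalNodes` elements (both inferred from the surrounding code, not confirmed by this diff):

```go
internalNodesMap := make(map[flow.Identifier]struct{})
for _, node := range internalNodes {
	if !currentEpochIdentities.Exists(node.Identity()) {
		// warn instead of erroring: surfaces the unusual state to operators
		// running the efm-recover-args CLI without aborting recovery
		log.Warn().
			Str("node_id", node.NodeID.String()).
			Msg("internal node is not in the current epoch identities")
		continue
	}
	internalNodesMap[node.NodeID] = struct{}{}
}
```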
```go
publicKeyShares := make([]crypto.PublicKey, 0, dkg.Size())
for i := uint(0); i < dkg.Size(); i++ {
	nodeID, err := dkg.NodeID(i)
```
In both instances where we currently use `DKG.NodeID`, we use it to construct a list of participant public key shares. Since the underlying `DKG` implementation (backed by `EpochCommit`) already contains this data in the `DKGParticipantKeys` field, why don't we replace the `DKG.NodeID` method with `DKG.AllKeyShares()`, which returns `EpochCommit.DKGParticipantKeys`?
Then, we can replace lines 63-74 with:

```go
publicKeyShares := dkg.AllKeyShares()
```
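A sketch of what the proposed method could look like. The wrapper type name below is hypothetical; per the comment above, the `EpochCommit`-backed implementation already holds the key shares in `DKGParticipantKeys`:

```go
import (
	"github.com/onflow/crypto" // import path may differ by flow-go version

	"github.com/onflow/flow-go/model/flow"
)

// dkgCommittee is a hypothetical stand-in for the EpochCommit-backed
// implementation of the protocol's DKG interface.
type dkgCommittee struct {
	commit *flow.EpochCommit
}

// AllKeyShares returns the random beacon public key shares of all DKG
// participants, ordered by DKG index. It simply exposes
// EpochCommit.DKGParticipantKeys, replacing the per-index NodeID lookup loop.
func (d *dkgCommittee) AllKeyShares() []crypto.PublicKey {
	return d.commit.DKGParticipantKeys
}
```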
Yeah, sounds good, I will add it.
```go
// prepare the staking public keys of participants
stakingKeys := make([]crypto.PublicKey, 0, len(allParticipants))
stakingBeaconKeys := make([]crypto.PublicKey, 0, len(allParticipants))
```
Suggested change:

```diff
-stakingBeaconKeys := make([]crypto.PublicKey, 0, len(allParticipants))
+beaconKeys := make([]crypto.PublicKey, 0, len(allParticipants))
```
In other parts of the code, we generally use the term "beacon key". Since what we are mainly differentiating it from is the "staking key", I think omitting "staking" from the name makes sense.
I think this is a special case: these are basically beacon keys that take part in staking. There is a separate `beaconKeys` further down the line that is used only for the random beacon protocol.
```go
	}

	stakingSigAggtor, err := signature.NewWeightedSignatureAggregator(allParticipants, stakingKeys, msg, msig.ConsensusVoteTag)
	if err != nil {
		return nil, fmt.Errorf("could not create aggregator for staking signatures: %w", err)
	}

	dkg, err := f.committee.DKG(block.View)
	rbSigAggtor, err := signature.NewWeightedSignatureAggregator(allParticipants, stakingBeaconKeys, msg, msig.RandomBeaconTag)
```
Suggested change:

```diff
-rbSigAggtor, err := signature.NewWeightedSignatureAggregator(allParticipants, stakingBeaconKeys, msg, msig.RandomBeaconTag)
+beaconAggregator, err := signature.NewWeightedSignatureAggregator(allParticipants, stakingBeaconKeys, msg, msig.RandomBeaconTag)
```
Nit: reuse the "beacon" terminology from above, and expand "Aggtor" so it's pronounceable. (Feel free to ignore if you disagree.)
```go
	require.Equal(s.T(), events[0].Events[0].Type, eventType)

	// 5. Ensure the network transitions into the recovery epoch and finalizes the first view of the recovery epoch.
	startViewOfNextEpoch := uint64(txArgs[1].(cadence.UInt64))
```
Have you tried adding a conditional to wait for `endViewOfNextEpoch+1` and validate `s.AssertInEpoch(s.Ctx, 2)`? (That is, successfully complete the DKG and the epoch transition within the recovery epoch.) It would be nice to include this in one of the test cases if the runtime is manageable.
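A rough sketch of that extra assertion, assuming the suite exposes an `AwaitFinalizedView` helper and that the end view of the recovery epoch can be read from the transaction arguments (the argument index and timeout below are guesses for illustration):

```go
// wait until the recovery epoch has ended, then confirm the network
// completed the DKG and transitioned into the following epoch (counter 2)
endViewOfNextEpoch := uint64(txArgs[2].(cadence.UInt64)) // index assumed
s.AwaitFinalizedView(s.Ctx, endViewOfNextEpoch+1, 4*time.Minute, 500*time.Millisecond)
s.AssertInEpoch(s.Ctx, 2)
```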
I haven't tried it but I can add it.
```go
	// wait until the epoch setup phase to force network into EFM
	s.AwaitEpochPhase(s.Ctx, 0, flow.EpochPhaseSetup, 10*time.Second, 500*time.Millisecond)
```
Suggested change (remove these lines):

```diff
-// wait until the epoch setup phase to force network into EFM
-s.AwaitEpochPhase(s.Ctx, 0, flow.EpochPhaseSetup, 10*time.Second, 500*time.Millisecond)
```
In principle, stopping the Collection Node before entering the EpochSetup phase would be more reliable. The Consensus Nodes send off the first DKG message as soon as the EpochSetup phase starts, and can technically send their result submission after the DKG final view, so this could be race-prone.
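A sketch of the more deterministic ordering this suggests, assuming the suite's `AwaitEpochPhase` helper and a hypothetical `stopCollectionNode` helper standing in for the suite's container-control machinery:

```go
// stop the Collection Node while still in the staking phase, before any
// DKG messages are exchanged; without it, cluster QCs cannot be produced,
// so the network enters EFM deterministically once EpochSetup begins
s.AwaitEpochPhase(s.Ctx, 0, flow.EpochPhaseStaking, 10*time.Second, 500*time.Millisecond)
stopCollectionNode(s) // hypothetical helper
```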
```go
	// assert transition to second epoch did not happen
	// if counter is still 0, epoch emergency fallback was triggered as expected
	s.AssertInEpoch(s.Ctx, 0)
```
I think this mechanism for checking EFM dates to before we added `EpochPhaseFallback`. We could retrieve the snapshot (currently line 248) a bit earlier and assert that `snapshot.EpochPhase() == EpochPhaseFallback`, to make the test logic a little more explicit.
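A sketch of that more explicit check, assuming the snapshot retrieval the test already performs (the `GetLatestProtocolSnapshot` helper name and `EpochPhase()` signature are taken from the comment above and the suite, not from this diff):

```go
// retrieve the snapshot earlier and assert the fallback phase explicitly,
// instead of inferring EFM from the epoch counter alone
snapshot := s.GetLatestProtocolSnapshot(s.Ctx) // existing retrieval, moved up
epochPhase, err := snapshot.EpochPhase()
require.NoError(s.T(), err)
require.Equal(s.T(), flow.EpochPhaseFallback, epochPhase)

// the counter check remains as a secondary assertion
s.AssertInEpoch(s.Ctx, 0)
```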
Co-authored-by: Jordan Schalm <[email protected]>
While addressing comments, I found that the DKG is very inconsistent and flaky under this setup; I left more details here: #6729.
#6331
Context
This PR makes improvements to code clarity based on the parent branch and implements a new test which ejects a node before submitting the recovery transaction. This leads to a scenario where the DKG committee is a superset of the consensus committee when recovery takes place. To support this behavior, changes were made to the tooling, as well as to the vote-processing logic, to account for the inequality of the random beacon and consensus committees.
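As a rough illustration of what accounting for that inequality can look like in vote processing (a sketch under assumptions, not the PR's actual diff; names such as `beaconParticipants` are illustrative): when building the random beacon aggregator, only participants that hold a DKG key share may contribute beacon signatures, while every consensus participant can still contribute staking signatures.

```go
// Sketch: tolerate consensus participants that are missing from the DKG
// committee (and DKG members absent from consensus) instead of erroring.
stakingKeys := make([]crypto.PublicKey, 0, len(allParticipants))
beaconParticipants := make(flow.IdentitySkeletonList, 0, len(allParticipants))
beaconKeys := make([]crypto.PublicKey, 0, len(allParticipants))
for _, participant := range allParticipants {
	// every consensus participant contributes a staking key
	stakingKeys = append(stakingKeys, participant.StakingPubKey)

	// only participants present in the DKG contribute a beacon key share
	keyShare, err := dkg.KeyShare(participant.NodeID)
	if err != nil {
		continue // not in the DKG committee: staking signatures only
	}
	beaconParticipants = append(beaconParticipants, participant)
	beaconKeys = append(beaconKeys, keyShare)
}
```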